A Universal Scaling Theory for Complexity of Analog Computation 
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We discuss the computational complexity of solving linear programming problems by means of 
an analog computer. The latter is modeled by a dynamical system which converges to the optimal 
vertex solution. We analyze various probability ensembles of linear programming problems. For 
each one of these we obtain numerically the probability distribution functions of certain quantities 
which measure the complexity. Remarkably, in the asymptotic limit of very large problems, each 
of these probability distribution functions reduces to a universal scaling function, depending on a 
single scaling variable and independent of the details of its parent probability ensemble. These 
functions are reminiscent of the scaling functions familiar in the theory of phase transitions. The 
results reported here extend analytical and numerical results obtained recently for the Gaussian 
ensemble. 
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During the past two decades or so, physicists have 
been applying methods of statistical physics in studying 
hard combinatorial optimization problems of computer 
science Physical methods, borrowed from statistical 
physics and the theory of dynamical systems (DS) 0, 
have been applied recently in studying an easier com- 
puter science problem, namely, the computational com- 
plexity of an analog computation algorithm which solves 
linear programming (LP) problems 0, Q . In this Letter 
we demonstrate the robustness and universality of the 
results of 0,01 

To put things in broader context, we remark that 
analog computers are ubiquitious computational tools, 
alongside with their predominating digital counterparts. 
The most relevant examples of analog computers are 
VLSI devices implementing neural networks |5j, or neu- 
romorphic systems 0, whose structure is directly moti- 
vated by the workings of the brain. Various processes 
taking place in living cells can be considered as analog 
computation Q as well. 

Linear programming is a P-complete problem i.e. 
it is representative of all problems that can be solved in 
polynomial time. The standard form of LP is to find 



max{c 3 



x e R n ,Ax = b,x>0} 



(1) 



where c € M n , b S M m , A e M mxn and m < n. The set 
generated by the constraints in is a polyheder. If a 
bounded optimal solution exists, it is obtained at one of 
its vertices. The vector defining this optimal vertex can 
be decomposed (in an appropriate basis) in the form x — 
(xn,xb) where x_\f = is an n — in component vector, 
while xb = B~ x b > is an m component vector, and 
B is the to x m matrix whose columns are the columns 
of A with indices identical to the ones of xg. Similarly, 
we decompose A = (N, B), where N is an (77 — m) x m 
matrix. 

A DS whose flow converges to the optimal vertex, in- 
troduced by Faybusovich jfj, will be studied here. Its 



flow =| = F(x) is given in terms of the vector field 
F{x) = [X- XA T {AXA T )- 1 AX] c , 



(2) 



where X is the diagonal matrix Diag(xi . . .x n ). Geo- 
metrically, F is the projection of the gradient of the cost 
function c T x onto the constraint set, relative to a Rie- 
mannian metric, which enforces the positivity constraints 

x>oi. 

This DS, as it evolves in its continuous phase space in 
continuous time, models the analog computer in question, 
which solves the given LP problem. Other dynamical sys- 
tems are known (also described by ordinary differential 
equations) that are used to solve various computational 
problems Thus, a large set of analytical tools 

and physical intuition, developed for dynamical systems, 
turns out to be applicable to the analysis of analog com- 
puters. 

In contrast, the evolution of a digital computer is de- 
scribed by a dynamical system, discrete both in its phase 
space and in time. Consequently, the standard theory 
of computation and computational complexity |8| deals 
with computation in discrete time and phase space, and 
is inadequate for the description of analog computers. 
The analysis of computation by analog devices requiers a 
theory that is valid in continuous time and phase space. 

Since the systems in question are physical systems, the 
computation time is the time required for a system to 
reach the vicinity of an attractor (a stable fixed point 
in the present work) combined with the time required to 
verify that it indeed reached this vicinity. This time is the 
elapsed time measured by a clock, contrary to standard 
computation theory, where it is the number of discrete 
steps. 

In our model we assume we have a physical implemen- 
tation of the flow equation. Thus, the vector field F 
need not be computed, and the computation time is de- 
termined by the convergence time to the attracting fixed 
point. In other words, the time of flow to the vicinity 
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of the attractor is a good measure of complexity, namely 
the computational effort, for the class of continuous dy- 
namical systems introduced above [l2|. 
In this work, following 

0,0 (and in a manner similar 
to E0), the complexity will be evaluated probabilis- 
tically for an ensemble of LP problems. In this way the 
worst case scenarios, studied traditionally in computer 
science, will be ignored, since their probability measure 
vanishes. 

The main result of 0, Q , in which LP problems were 
drawn from the Gaussian distribution of the parameters 
of F (namely, the constraints and cost function in 0J), 
was that the distribution functions of various quanti- 
ties that characterize the computational complexity, were 
found to be scaling functions in the limit of LP problems 
of large size. In particular, it was found that those dis- 
tribution functions depended on the various parameters 
only via specific combinations, namely, the scaling vari- 
ables. Such behavior is analogous to the situation found 
for the central limit theorem, for critical phenomena fl5| 
and for Anderson localization 0], in spite of the very 
different nature of these problems. It was demonstrated 
m HI now for the implementation of the LP prob- 
lem on a physical device, methods used in theoretical 
physics enable to describe the distribution of computa- 
tion times in a simple and physically transparent form. 
Based on experience with certain universality properties 
of rectangular and chiral random matrix models, it was 
conjectured in 0, 0] that some universality for compu- 
tational problems should be expected and should be ex- 
plored. That is, the scaling properties that were found 
for the Gaussian distributions should hold also for other 
distributions. In particular, some specific questions were 
raised in 0, : Is the Gaussian nature of the ensemble 
unimportant? Arc there universality classes |15| of ana- 
log computational problems, and if they exist, what are 
they? In the present work we extend the earlier analysis 
0, l! of the Gaussian distribution to other probability 
distributions, and demonstrate numerically that the dis- 
tribution functions of various quantities that characterize 
the computational complexity of the analog computer, 
which solves LP problems, are indeed universal scaling 
functions. They depend upon the original probability 
ensemble of inputs only via the scaling variables, that 
are proportional to the ones found for the Gaussian dis- 
tribution. 

The distribution of constraints and cost function of 
the LP problems that are used in practice is not known. 
Therefore, the universality of the distribution functions 
of the computation time and other quantities related 
to computational complexity is of great importance. It 
would imply that distributions found for the idealized 
systems may be relevant also for the realistic sets of LP 
problems. In this paper we demonstrate numerically that 
for several probability distributions of LP problems, the 
functions of the quantities that characterize the complex- 



ity are indeed universal, providing support for the con- 
jecture that universality holds in general. 

It was shown in that the flow equations corre- 
sponding to (J2J are, in fact, part of a system of Hamilto- 
nian equations of motion of a completely integrable sys- 
tem of a Toda type. Therefore, like the Toda system, it 
is integrable with the formal solution [t| 



i(t) =z i (0)exp -Ait 



/ , ^3 
.7=1 



CCji log 



X 



j+n — m 



(*) 



Lj-\-n — rn 



(0) 



(3) 

(i = 1, . . . , n— m), that describes the time evolution of the 
n — m independent variables xj^(t), in terms of the vari- 
ables x&(t). In 121 Xi(0) and x J+ „_ m (0) are components 
of the initial condition, Xj+„_ m (i) are the components 
of the solution, ctji = —{B~ 1 N)ji is an m x (n — m) ma- 
trix, while 



3=1 



(4) 



For the decomposition x — (x_\f, xb) used for the optimal 
vertex, A^ > i = 1, . . . , n — m, and xj^(t) converges 
to 0, while xig(t) converges to x* = B~ 1 b. Note that 
the analytical solution is only a formal one, and does 
not provide an answer to the LP instance, since the 
depend on the partition of A, and only relative to a par- 
tition corresponding to a maximum vertex are all the 
positive. 

The second term in (|3J), when it is positive, is a kind 
of "barrier" : Ait must be larger than the barrier before 
Xi can decrease to zero. In the following we ignore the 
contribution of the initial condition and denote the value 
of this term in the infinite time limit by 



'J+n— m " 



(5) 



.7 = 1 



In order for x(t) to be close to the maximum vertex we 
must have Xi(t) < e for i = 1, . . . , n — m for some small 
positive e, namely exp(— Ait+(3i) < e , for i = 1, . . . , n — 
to. Therefore we consider 



T = max 



Pi , |loge| 



Ai Ai 
as the computation time. We denote 



A min = min A, 



max [3i 



(6) 



(7) 



The Ai can be arbitrarily small when the inputs are real 
numbers. Such "bad" instances (associated with the pos- 
sibility of vanishing x*j^~_ 's) are rare in the Gaussian 
probabilistic model of |3, where it was shown that 
typically A m ; n ~ 1/ \fm. In this Letter we will show that 
this scaling behavior of A m ; n is universal, being valid in 
a broad class of probability distributions of LP problems. 
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Consider an ensemble of LP problems in which the 
components of (A, b 7 c) are independent identically dis- 
tributed (i.i.d.) random variables taken from various 
even distributions, with mean and bounded variance. 
For a probabilistic model of LP instances, A m ; n , /3 max 
and T are random variables. 

In 0, H, the components of A, b, and c were taken 
from the Gaussian distribution (see, e.g., Eqs. (12-18) in 
with zero mean and variance a 1 . It was found an- 
alytically, in the large (n, m) limit, that the probability 
P(A mm < A\A mm > 0) = jF("' m )(A) is of the scaling 
form 



T {n ' m) (A) = 1 - erfc(a; A ) = F(x A ) 
with the scaling variable 



xa = oa {n/m) x' L 



a • 



where 

x'^ = y/mA and a a (n/m) = — -=( 1)— . 



(8) 



(9) 



(10) 



The scaling function T contains all asymptotic informa- 
tion on A. The distribution T{xa) is very wide and does 
not have a variance. Also the average of I/^a diverges. 

The amazing point is that in the limit of large m and 
n, the probability distributions of A m j n depend on the 
variables m, n and A only via the scaling variable xa- 
If the limit of infinite m and n is taken, so that n/m is 
fixed, a a is constant. It was verified numerically that 
for the Gaussian ensemble JHJ is a good approximation 1 
already for m = 20, and n — 40. 

The existence of scaling functions like 10 for the bar- 
rier P m ax (that is, the maximum of the (3i) defined by 
© , and for T defined by © (assuming that e is not too 
small so that the first term in © dominates), was verified 
numerically for the Gaussian distribution 0, 0] ■ 

The scaling behavior Q, and similar behavior found 
for \/(3 ma x and 1/T, rendering their distribution func- 
tions P/3(l/f3) and Py(l/T) scaling functions of the scal- 
ing variables xp and xt, all associated with the Gaus- 
sian ensemble, prompted us, for reasons discussed above, 
to explore their universality and check their validity for 
other probability ensembles of LP problems. Thus, we 
carried numerical calculations of the distribution func- 
tions of A, 0, and 1/T for various probability distribu- 
tions of A, 6, and c. 

To be specific, we studied: (1) the uniform distribu- 
tion, in which each entry of (^4, b, c) was uniformly dis- 
tributed between ±|; (2) the discrete, bimodal distri- 
bution, in which each entry was either +1 or -1 with 
probability \ each; and finally, (3) the diluted bimodal 
distribution, in which each entry was either +1 or -1 with 
probability | each, or with probability I — p. Here we 
chose p = 0.2, 0.5 in numerical calculations. 



For continuous distributions the probability of encoun- 
tering degenerate solutions (where some of the x* +n _ m 's 
in JSJ may vanish) is of measure zero, while for the dis- 
crete ensembles some regularization was introduced, in 
order to avoid degenerate situations. 

We generated full LP instances (A, b, c) with the prob- 
ability distribution in question. For each instance the LP 
problem was solved using the linear programming solver 
of MatLab. Only instances with a bounded optimal so- 
lution were kept, and A m i n was computed relative to the 
optimal partition and optimality was verified by checking 
that A m i n > 0. Using the sampled instances we obtain 
an estimate of 

jK«,m)(A) = P A (A) = V(A min < A\A min > 0) (11) 

and of the corresponding cumulative distribution func- 
tions of the barrier /9 ma x and the computation time T. 

The solution of the LP problem is used here in order to 
identify the optimal partition of A into B and N. This 
enables one to compute A m i n , I3 ma x, and T from (JSJ and 
0, and the distribution -Pa (A) as well as Pp{l/0) and 
P T (l/t). 

For example, the scaling behavior of (jl 1|) in the case of 
the uniform ensemble can be seen in Fig. ^ Similar scal- 
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FIG. 1: Pa is plotted as a function of x' A for the uniform 
distribution for LP problems with n = 2m. The number of 
instances used in the simulation is 121939 for the m — 20 case, 
91977 for the m = 30 case and 112206 for the m = 40 case. 
The number of converging instances for each case is 20000. 

ing behavior is found also for Pp and Pt for the uniform 
distribution. Scaling behavior of this nature was con- 
firmed also for the bimodal distribution, and preliminary 
results suggest that it holds also for the diluted bimodal 
distribution. 

A natural question which arises is whether the distri- 
bution functions Pa, Pp and Pt are universal 0,0]- In 
other words, do all probability ensembles of LP prob- 
lems, or at least a large family thereof, yield the same 
functions Pa, Pp and Pt of the scaling variables XA,xp 
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and xt? We found that the answer to this question is on 
the affirmative. 

Specifically for A, plots like Fig. Qwere produced for 
the variables x' A for the various distributions that were 
studied. Indeed, these were found to be scaling functions. 
Then, the scale factors a a, corresponding to (|1(J|I were 
calculated for the various distributions so that Pa as a 
function of a; a is the same function for all distributions. 
(This is done, as usual, by least squares fit.) Indeed, a 
universal function for Pa is found, as is clear from Fig. 
13 and it reduces to © that was found for the Gaussian 
ensemble in 0, Q . It turns out that all the scaling factors 
that were found in this way for the various distributions 
are in agreement with (in the way it depends on 
a). Similar scaling was found also for the distribution 
functions Pp and Pr of 1/Pmax and 1/T, respectively. 
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FIG. 2: Pa as a function of xa- Here n = 2m and m — 40. 
The scale factors oa are: 1.044/ v / 7r for the bimodal distribu- 
tion, 3.603/v^" for the uniform distribution, and 1/\/tt for the 
gaussian distribution. For the Gaussian and bimodal distribu- 
tions the variance is a 2 — 1 , while for the uniform distribution 
a 2 = 1/12. 

In summary, we find that the asymptotic distribution 
functions Pa, Pp and Pp are the same for all distributions 
of the parameters studied here, if expressed in terms of 
the scaling variables x&,xp and xp. Furthermore, we 
find that then the following combinations of scaling fac- 
tors, craA, up and aax are independent of the probability 
distribution. Therefore, in particular, should satisfy 
HI Op that was found for the Gaussian distribution, while 
ap and ot are yet to be found analytically. 

Based the results presented in this paper, as well as the 
results of 0,0], we conjecture that the scaling behavior 
of the various distribution functions (and the correspond- 
ing scaling factor) is universal, i.e., that it is robust and 



should be valid in a large class of ensembles of LP prob- 
lems, in which the (A, b, c) data are taken from even dis- 
tributions, with zero mean and finite variance. 
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